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Abstract 

Nonlinear  regression-adjusted  control  variables  are  investigated  for  improving  variance 
reduction  in  statistical  and  system  simulations.  To  this  end,  simple  control  variables  are 
piecewise  sectioned  and  then  transformed  using  linear  and  nonlinear  transformations.  Opti¬ 
mal  parameters  of  these  transformations  are  selected  using  linear  or  nonlinear  ieast-squares 
regression  algorithms.  As  an  example,  piecewise  power- transformed  variables  are  used  in  the 
estimation  of  the  mean  for  the  two- variable  Anderson-Darling  goodness-of-fit  statistic  H  f 
Substantial  variance  reduction  over  straightforward  controls  is  obtained.  These  parametric 
transformations  are  compared  against  optimal,  additive  nonparametric  transformations  ob¬ 
tained  by  using  the  ACE  algorithm  and  are  shown,  in  comparison  to  the  results  from  ACE, 
to  be  nearly  optimal. 


1  PRELIMINARIES 

This  paper  investigates  the  use  of  possibly  nonlinear,  regression-adjusted  control  variables 
for  variance  reduction  in  statistical  and  system  simulation.  Let  Q_  be  a  vector  of  control  variables 
which  are  correlated  with  (related  to  or  associated  with)  a  statistic  of  interest,  V'.  and  assume 
that  C  has  a  known  mean  vector  E[Q],  The  object  is  to  more  accurately  estimate  /v[Y']  by 
deriving  a  controlled  statistic  V  which  has  less  variance  than  Y .  A  standard  method  for  doing 
this  is  via  the  linear,  additive  combination  of  Y  and  the  components  of  C, 


i  In'  parameter  vector  J  is  a  vector  of  unconstrained  constants  which  are  to  be  chosen  so  as  to 
i:umrni/e  the  variance  of  V'.  Note  t fiat  some  components  of  CL  may  be  known  power  transfor¬ 
mations  of  other  components,  so  that  polynomial  control  schemes  are  included  in  formulation 
1  1  1.  Kxplirit  expressions  for  the  components  of  (3  which  minimize  the  variance  of  Y'  can  be 
found  in  terms  of  the  second  order  moments  of  Y  and  C,  and  with  these  parameters,  Y'  is  an 
unbiased  estimate  of  A  [V], 

In  particular,  consider  t  he  case  of  a  single,  additive,  linear  control  Y'  =  Y  —  0  ( C  —  E  [C] ) . 
Here  3  is  chosen  to  minimize  var(Y').  This  variance  is  minimized  when  3  is  proportional  to  the 
correlation  between  C  and  V':  t lie  greater  the  correlation,  the  greater  the  effectiveness  of  the 
control  in  obtaining  variance  reduction.  Assuming  var(Y)  =  var(C),  the  result  follows  from: 

rar(Y')  —  var(Y)  +  02var(C)  —  20cov(Y,C) 

=  vnr(Y)(\  +  32  -20p(Y.C))  . 


1'itferentiating  with  respen  to  J  and  setting  the  resulting  expression  equal  to  zero  yields  the 
optimal  value  for  3: 

3  =  p(Y.C), 


e:\ii 


In  particular. 


r«r(  >") 
rar(Y) 


P(Y,C)2. 


(2) 


100 


r«r(V' )  -  rnr(V') 
rar()') 


100  1 


var(Y')  \ 
var(Y) ) 


100p(r,C)2 


id) 


measures  the  percent 
of  equal  varia nces.  we 


variance  reduction  resulting  from  the  control, 
have 


3  = 


oy 


P(Y.C). 


Without  the  assumption 


while  (2)  still  holds.  Thus  if  p(Y,  C).  oy .  and  <Jc  are  known,  p(Y,C )  is  a  direct  measure  of 
the  variance  reduction  which  can  be  obtained  with  a  single  regression  adjusted  control.  In  fact 
comparing  correlations  is  a  method  for  choosing  between  proposed  controls. 

This  paper  generalizes  (1)  by  letting 


Y'  =  Y-C',  (4) 

where  ("  is  anv  mean  zero  linear  or  nonlinear  parametric  function  of  the  components  of  CL.  i.e. 

For  example,  C  might  involve  additive  or  multiplicative  combi¬ 
nations  of  unspecified  power  transformations  of  the  components  of  the  original  control  vector  CL 
Opt  imal  or  near-optimal  values  of  the  unknown  parameters  of  these  transformations,  analogous 
to  •>  in  (  1  ),  are  obtained  by  minimizing  the  variance  of  Y'.  However,  the  results  are  not  explicit 
functions  of  the  joint  and  higher  moments  between  Y  and  the  set  of  control  variables. 
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Now  hr  this  more  general  case  of  multiple,  possibly  nonlinear,  control  variables  we  obtain, 
using  (4), 


ear(y') 

var(Y) 


1  .  li  r(C') 

t.  'Y) 


2^1  p(Y,C) 
ay 


1  +  k7  -  2kp(Y,  C) 


(5) 


1  -  TS(F)  =  2MV.C-)  -  (6| 

where  k  is  positive  valued.  While  this  last  equation  is  simple  in  form,  both  p(Y,C')  and 
k  =  ac /<xy  are  functions  of  the  parameters  in  C' .  Thus  it  is  not  true  that  in  order  to  maximize 
the  variance  reduction  with  respect  to  the  parameters  of  the  control  function,  one  need  only 
maximize  the  absolute  value  of  the  correlation  between  Y  and  C' . 

When  C'  is  a  linear  additive  function  of  the  components  of  Q  as  in  (1),  p(Y,C')  is  a 
quadratic  function  of  the  parameters  0  whose  optimal  values  are  a  function  of  the  correlation 
matrix  of  (V'.C) ,  i.e.  the  joint  and  higher  moments  between  Y  and  the  set  of  control  variables. 
In  fact,  explicit  expressions  for  the  optimal  values  of  /?  are  known  (Rubinstein  and  Marcus, 
1985).  As  an  example,  for  two  independent  linear  controls  with  known  correlations  with  Y,  it 
follows  from  (5)  that  with  the  optimal  values  of  0, 

\-p{Y,Ctf  -p{Y,C2)2.  (7) 

Choosing  control  variables  with  maximum  correlations  with  Y  will,  in  this  case,  still  maximize 
the  reduction  in  variance. 

In  the  general  case  (4),  when  the  controls  are  not  independent,  and  ac/cry  p(Y,C) 
in  (5),  p(Y.C)  does  not  yield  an  exact  measure  of  variance  reduction  as  does  p(Y,C)  in  (3). 
Additionally,  the  allowable  range  of  parameters  may  be  constrained  in  the  function  which 
generates  C  out  of  the  components  of  C_  by  the  requirement  that  E[C'}  must  be  known, 
exactly  or  approximately,  and  must  be  finite. 


2  THE  ACE  PROGRAM 


The  ACE  (Alternating  Conditional  Expectation)  program  (Breiman  and  Freidman,  1985) 
provides  a  method  for  estimating  the  minimum  variance  obtainable  by  regressing  a  variable 
\  on  an  additive  combination  of  arbitrary  transformations  of  another  set  of  variables  such  as 
the  components  of  Q.  As  such  it  suggests  that  in  a  simulation  context,  more  control  (more 
variance  reduction)  can  be  obtained  with  transformations  of  the  chosen  control  variables.  ACE 
estimates  transformation  functions  8{Y)  and  4>](Cj)  to  minimize  the  fraction  of  the  variance  of 
E  net  explained  by  regression  (e2)  defined  as  follows: 


!  (9,  0, , . . . ,  tl>  )  = 


£[02(n] 


(S) 


r 


Distribution/ 
Avalltibiiiv  .>■  Co 
etvrtli  and/o 
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The  algorithm  uses  conditional  expectations  and  alternates  between  improving  the  estimate  of 
0(Y)  and  improving  the  estimates  of  the  ipj(Cj)s.  The  computational  mechanics  on  finite  data 
sots  of  continuous  variables  involve  the  use  of  data  smooths  to  approximate  the  conditional 
expectations  in  order  to  repeatedly  reduce  the  mean  squared  error  until  it  is  minimized. 

The  ACE  procedure  is  nonparametric,  with  the  transformations  selected  solely  on  the  ba¬ 
sis  of  the  data  sample.  Minimal  assumptions  about  the  distribution  of  the  sample  or  about 
allowable  transformations  enable  ACE  to  produce  an  estimate  of  the  minimum  mean  squared 
error  between  the  transformed  V'  variable  and  the  sum  of  the  transformed  components  of  C_. 
When  C_  has  only  one  component,  this  is  equivalent  to  maximizing  the  correlation  between  a 
transformed  Y  and  a  transformed  C. 

Unfortunately,  the  transformations  ACE  selects  cannot  be  used  to  develop  control  variables 
for  variance  reduction  since  the  transformations  are  non-parametric  and  the  true  means  of 
the  transformed  variables  cannot  be  determined.  However,  one  can  use  the  minimum  mean 
squared  error  from  ACE  to  obtain  an  upper  bound  on  the  variance  reduction  that  can  be 
achieved  between  V  and  C'  in  a  parametric  control  function  such  as  (4).  Thus,  ACE  may  be 
used  to  gauge  the  effectiveness  of  any  control  function  using  a  fixed  set  of  control  variables. 
Since  (4)  does  not  allow  for  transformations  of  the  dependent  variable,  ACE  was  intentionally 
limited  during  this  study  to  using  only  linear  transformations  of  Y . 

3  THE  SAMPLE  ANALOG  TO  THE  VARIANCE  REDUC¬ 
TION  FORMULA 

In  practice,  one  has  no  theoretical  information  about  the  joint  probability  properties  of  Y 
and  C',  or  the  joint  probability  properties  of  Y  and  the  components  of  Q_.  Instead  one  has  a 
simulation  sample  of  size  m  of  independent  replications,  { )j  £, }  :  i  =  1 , . . . ,  m,  from  which  to 
estimate  E  [V'j.  Regardless  of  whet  her  the  sample  is  large  or  small,  i.e.  is  a  pilot  sample  or  all 
of  the  simulation  data  that  will  be  available,  one  wants  to  minimize  the  sample  variance  of  Y'. 
Minimizing  th  sample  variance  involves,  after  subtracting  Y  from  both  sides  of  (4),  minimizing 

zr=.  (>;  -  y  -  cA 

- - - -  (9) 

m 

Er=i  (>',  -  f) 2  c'2  2  £r=,  (l  -  y)  c; 

- ! - U  +  ^1=1  ■ - i - L —  (10) 

mm  m 

The  left-hand  side  of  (9)  is  the  quantity  to  be  minimized  as  E  |Vj  =  E  [V'j  =  E  [V'j  since 

E  n  is  known  to  be  zero.  Thus  either  Y  or  Y'  can  be  used  in  the  estimate  of  the  variance 
of  1  '.  Equation(9)  shows  that  this  estimate  of  the  variance  of  Y'  is  equal  to  the  residual 
sum  of  squares  of  the  least  squares  regression  of  Vj  -  Y  on  C .  Equation  (10)  involves,  in  its 
first  term,  the  total  sum  of  squares,  which  estimates  the  variance  of  in  its  second  term  the 
sample  variance  of  the  zero  mean  C'\  and  in  the  last  term  the  sample  covariance  of  V  and  C . 
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Rearranging  terms  in  (10),  we  have 


xx.  fo-v)2  Er=i  (y; -  y) 2  _  2a  {y-y)c>  y^c? 


e:=,  (Yi  -  v')2  -  ETLt  (y;  -  y)2  _  2EE1  fc  -y)c,'  C;2 


EE,  (e-k)' 


ee,  (e  -  y)  Er=i  (e  -  vy 


The  left-hand  side  of  ( 1 1 )  is  the  usual  R 2  regression  measure  and  equation  (11)  may  be  rewritten 
as 

R 2  =  2^  r(KC')  -  %  =  2k  r(Y,C')  -  k2 .  (12) 

by  ^y 

As  the  sample  analog  to  (6),  (12)  indicates  that  maximizing  R2  through  nonlinear  least- 
squares  regression  is  equivalent  to  maximizing  sample  variance  reduction  when  the  optimal 
parameters  are  unknown.  Thus  for  Q_  with  multiple  components,  maximizing  the  effectiveness 
of  C  can  be  accomplished  through  estimating  the  parameters  of  C'  via  multiple  least-squares 
regression  of  Y'  —  Y  on  C  .  A  similar  result  relating  optimal  regression  and  optimal  correlation 
can  be  found  in  the  ACE  article  (Breiman  and  Friedman,  1985). 

With  linear  controls,  linear  least-squares  regression  will  provide  a  global  minimum  for  the 
residual  sum  of  squares,  in  turn  maximizing  the  variance  reduction  for  the  sample.  When  the 
control  function  is  nonlinear,  nonlinear  least-squares  regression  will  not  necessarily  determine 
parameter  values  which  globally  minimize  the  residual  sum  of  squares  since  the  function  could  be 
nonconvex.  With  a  control  function  C  =  -  E  that  is  nonconvex,  there  may¬ 

be  many  suboplimal  local  minima.  In  this  case  the  choice  of  initial  values  for  the  parameters  ^ 
in  the  nonlinear  regression  may  significantly  affect  the  amount  of  variance  reduction  obtained. 
If  one  uses  as  starting  values  for  3  the  special  values  which  represent  the  linear  case  tor  the 
control,  one  should  always  do  at  least  as  well  as  the  linear  case  regardless  of  nonconvexitv. 

One  must  be  careful  that  while  multiple  regression  may  be  computationally  useful,  the 
distribution  theory  behind  multiple  regression,  which  assumes  fixed  independent  variables,  does 
riot  apply.  Consequently,  confidence  intervals  on  parameter  estimates  cannot  be  determined 
directly  from  the  regression  results. 


4  APPROXIMATING  OPTIMAL  NONLINEAR  TRANS¬ 
FORMATIONS  FOR  NONLINEAR  CONTROLS 

Since  ACE  does  not  supply  any  parametric  clue  to  the  optimal  transformations  of  the  indi¬ 
vidual  components  of  £,  approximations  are  needed  for  these  transformations.  A  requirement 
for  the  approximations  is  that  they  contain  the  linear  additive  case  ( 1 )  as  a  special  set  of  pa¬ 
rameter  values,  thus  ensuring  that  one  attains  at  least  the  known  variance  reduction  for  this 
case.  The  approximations  studied  here  take  two  forms,  piecewise  linear  controls,  and  standard 
statistical  parametric  transformations,  used  separately  or  conjointly  on  each  component  of  C  . 
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4.1  Piecewise  Linear  Transformations  of  Controls 

Statistics  are  often  nonlinear  functions  of  the  random  variables  from  which  they  are  de¬ 
rived.  Therefore  one  might  expect  some  nonlinear  controls  to  have  a  higher  correlation  with 
V  than  linear  controls.  While  not  a  measurable  guarantee  of  improved  variance  reduction,  (G) 
suggests  that  higher  correlation  indicates  that  a  nonlinear  control  may  be  able  to  be  a  better 
“control”  than  the  linear  controls.  Given  an  initial  guess  at  a  viable  linear  control,  one  type  of 
nonlinear  control  can  be  formed  by  using  indicator  functions  and  '‘cutpoints”  to  form  piecewise 
linear  transformations  of  the  control.  Graphical  analysis  can  be  useful  in  selecting  the  initial 
cutpoint(s). 

For  example,  a  control  variable  C  is  split  into  two  control  variables  about  a  cutpoint  b  as 
follows: 

n  _  \  C  if  c  <b  _  j  C  if  C  >  b 

1  |  0  otherwise  !'  2  j  0  otherwise. 


By  judicious  choice  of  the  variable  cutpoint  b  or  perhaps  multiple  cutpoints,  least-squares 
multiple  regression  can  achieve  a  better  fit  without  the  use  of  additional  control  variables.  As 
an  example  let  .V  be  distributed  as  an  independent  Uniform  (-.5.  .5)  variate.  Let  Y  -  A’2  +  r 
where  <  is  distributed  as  independent  Normal  (0,  0.01).  With  300  samples,  using  just  A  as  a 
linear  control  as  in  (1),  linear  least-squares  regression  yielded  an  R2  of  0.00.  However  using  A 
to  form  two  new  controls  as  in  ( 13).  with  b  =  0,  yielded  an  R?  of  .92  using  linear  least -squares 
regression.  If  b  —  oc  or  b  — -  — ■».  the  ordinary  linear  control  is  obtained  Of  course,  care  must 
be  taken  in  determining  the  form  of  the  control  function  to  ensure  it  has  mean  zero,  i.e.  T[Gi] 
-.r.d  .  '  [C~\  rr-"*  Ge  kix-wn  v~*e  also  ti>*»*  t'>o  regression  is  still  linear  if  b  is  yivin,  but  it  is 
nonlinear  otherwise.  Finding  an  optimal  b  then  becomes,  in  general,  a  nonconvex.  nonlinear, 
mathematical  programming  problem. 


4.2  Transformations  of  Controls 

Several  standard  transformations  are  used  in  staiislLo  and  naJyilj  and  these  can 
be  applied  to  approximations  for  the  optimal  transformation  of  a  control  variable  C.  Power 
transformations  of  controls,  iri  addition  to  piecewise  transformations  of  controls,  introduce 
nonlinearity  into  the  controlled  estimate  of  E  [)'}  while  containing  the  untransformed  control 
as  a  special  case.  The  power  transformation  used  initially  in  this  study  is  of  the  form  7.  = 
( A’ v  —  1  )/p.  for  p  >  -1.  This  scaled  power  transformation  has  the  property  that  as  p  —  0  the 
limit  is  In  A  and  when  p  —  1  it  gives  a  shifted  version  of  the  original  variable. 

This  power  transformation  can  have  vastly  different  effects  for  A’  >  1  and  A  <  1.  The 
curves  in  Figure  1  represent  a  sample  of  possible  transformations.  As  one  increases  p.  the 
change  in  ;.he  nature  of  the  function  on  either  side  of  X  =  1  becomes  more  drastic.  For  largo 
values  of  p.  large  values  of  .A  are  given  added  weight  while  for  small  values  of  p,  the  small 
'allies  of  A  are  given  the  additional  weight.  Note  that  when  p  —  1,  this  is  simply  tin'  linear 
transformation.  Thus  optimizing  using  this  transformation  assures  variance  reduction  at  least 
as  good  as  in  the  linear  rase. 


(i 


Figure  1:  Examples  of  Power  Transformations  of  a  Variable  X. 


Using,  for  example,  the  single  control  variable  C.  the  resulting  nonlinear  control  function  is 


C  =  3 


{ 


Cp  -  1 
P 


which  has  two  parameters,  and  3.  Of  course,  combinations  of  piecewise  transformations  and 
power  transformations  are  also  possible,  and  it  is  this  combination  of  nonlinear  controls  which  is 
the  main  thrust  of  this  paper.  With  this  combination  one  hopes  to  come  close  to  the  maximum 
theoretical  variance  reduction  which  could  be  obtained. 


5  AN  EXAMPLE 

Estimating  the  mean  of  the  Anderson-Darling  goodness-of-fit  statistic,  U„.  (Anderson  and 
Darling,  1952)  provides  a  good  example  of  the  benefits  of  piecewise  internal  controls  and  power 
transformations.  The  example  is  artificial  since  E  [H’t, ]  is  known  to  be  one  for  all  n.  and 
the  determination  of  the  quantiles  is  the  real  problem.  However,  the  example  is  useful  as  an 
illust  ration. 


Tli.'  statistic  IT*  can  he  determined  as  a  function  of  n  independent  unit  exponential  random 
variables  F,  ( Lewis  and  Orav,  1987).  Note  first  that  one  can  write  IT/  as  a  function  of  order 
statistics  from  a  unit  exponential  distribution  as  follows: 
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IT*  =  -n  -  (n-1)  Y,  [-(2»  -  1)1ii{i-€-K‘*>}  +  {2(«  -  »)  +  lj  £,’(,)]  H  D 

1=1 


where  the  /■./,)  are  the  order  statistics  from  a  unit  exponential  population.  These  order  statistics 
can  in  turn  be  expressed  in  terms  of  the  n  independent  unit  exponential  random  variables  /-, 


as 


-(*) 


V — tii — . 

kln~J  +  l 


Together  (l  l  i  and  (la)  give  IT*’  as  a  function  of  n  independent  exponential  random  variables. 
Die  independence  of  these  random  variables  makes  them  ideal  for  controlling  IT*.  The  case 
n  2  is  presented  here,  for  which  (7)  holds  with  C\  -  F\  and  C2  =  F2. 

\s  mentioned  before,  graphical  methods  can  sometimes  be  useful  in  determining  types  of 
i.eitrols  or  aspects  of  conti ols.  Two  useful  plots  of  U’/  are  presented  here.  Figure  2  is  a 
'itrface  plot  of  If/  over  a  small  region  of  the  F\ ,  F.  2  plane  where  the  majority  of  values  occur. 
I  his  is  not  a  density  plot,  but  a  representation  of  the  functional  relationship  between  the  two 
independent  exponentials  and  the  If/  values  each  pair  generates.  Subsequent  surface  plots  of 
the  control  fiitu  lions  likewise  do  not  portray  density:  just  the  surface  generated  by  the  control 
function.  As  an  indicator  of  the  density  of  points  on  the  If/  surface.  Figure  3  is  a  sample 
bivariate  histogram  of  1000  independent  pairs  of  unit  exponentials.  While  one  could  plot  an 
actual  bivariate  exponential  density,  the  discrete  nature  of  the  histogram  allows  easier  visual 
comparisons  id  density,  logether.  Figures  2  and  3  indicate  why  nonlinear  controls  may  prove 
imeliil  for  controlling  IT/.  Clearly  the  relationship  between  IT/.  F.\  and  F2  is  highly  nonlinear 
suggesting  the  use  of  nonlinear  controls.  Figure  3  supports  one's  intuition  that  the  majority  of 
pairs  of  the  bivariate  exponential  are  dose  to  the  origin.  Suspecting  this  one  may  be  tempted 
to  use  a  linear  control  to  just  approximate  the  surface  in  this  region.  However  Figure  3  also 
shows  a  significant  number  of  pai rs  throughout  the  plane.  Thus  in  order  for  ("  to  be  an 
effective  control,  t lie  entire  surface  should  be  approximated  by  the  control.  This  would  require 
a  nonlinear  control  and  nonlinear  regression. 


X 


Six  different  linear  and  nonlinear  <ontrol  filiations  for  estimating  the  mean  ot  W'.f  were 
evaluated  using  a  single  sample  of  500  pairs  of  unit  exponentials  and  their  associated  U  j 
values.  The  experimental,  A  PL- based  CRAFSTAT,  from  IBM  Research,  was  used  for  all  of  the 
computing.  The  following  six  control  functions  were  compared: 


C'  =  ai(£i  -  l)  +  fo(E2-  1): 


(16) 


C'  =  (E,  -  1 )  +  lh  (Ei  -  1)  +  p3  (El  -  2)  +  fU  (Ej  -  2)  :  (17) 


C  =  ZU  h 


E’  -\ 


r, 


-  E 


where 


and 


Kf  -1 


(  '  --  d)  (  /'- 1  —  1  )+  T'2(  i-2  —  1 )  +1^3  I  — ' lysr - A 


f'  -  EL,  EL,  tijk 


E?  ->1 


f.f2-l 


/J2 


b?jk  -  \ 

i,k  L 

Fjk 

0  otherwise 


^  '  '  '  an<i  K»  =  \  \  otherwise  J  =  L2: 


(IX) 
:  ( 10) 

(20) 


here 


( '  -  Ej= i  Jjk 


f-f jk  ~\ 

j*. 

Kr->11 

r,k 

A,  <  *,i 

ot  herwise. 


= 


Ej  t'j  i  <  ^  j  -  (*1;  - 
0  otherwise 


and 


(21) 


- 


E,  Ej  >  by? 

0  otherwise 


J  =  1,2. 


As  expected.  the  simplest  controls.  (16)  and  (17).  with  straightforward  control  functions, 
were  usually  less  effective  than  the  nonlinear  controls.  The  controls  give,  by  (16)  and  ( 1  <  ) 
are  referred  to  as  the  “standard"  controls  because  their  unknown  parameters  ran  be  computed 
using  linear  least-squares  regression.  Since  the  necessary  expected  values  of  the  controls  just 
involved  the  first  two  moments  of  the  exponentials,  they  were  determined  analytically  and  not 
estimated.  The  remaining  parameters  for  controls  (Hi)  and  (17),  respectively  .f]  and  .T.  and 
J|.  ,i2.  >h  and  .T,  were  computed  using  linear  least-squares  regression.  Since  control  ( 16)  is  a 
linear  function  of  E\  and  E2  and  IT22  is  a  very  nonlinear  function  of  E\  and  A 2 ■  this  control, 
not  surprisingly,  achieved  an  H1  of  only  .2265.  This  poor  performance  could  also  be  predicted 
by  using  the  sample  estimates  for  p(Wj,E\)  and  />(H 22,  E2)  in  (7).  If  the  estimates  were  the 
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true  correlations,  the  optimal  d's  would  only  yield  a  22.66  percent  variance  reduction.  The 
parabolic  shape  of  (17),  as  shown  in  Figure  4,  enabled  the  control  function  to  achieve  an  R 2 
of  .5627.  While  better,  it  is  far  from  optimal.  Note  that  on  the  graphs  of  the  controls  the 
predicted  values  of  the  controls  are  centered  about  zero,  the  mean  of  C . 


Figure  4:  Surface  generated  by  the  parabolic  linear  control  given  in  (17).  The  control  is  linear 
since  the  powers  are  fixed. 

For  control  (J.9)  only  the  linear  terms'  expected  values  could  be  calculated  analytically.  The 
other  two  expected  values  were  functions  of  the  unknown  parameters  and  had  to  be  recalculated 
based  on  the  current  parameters  during  the  optimization.  For  controls  (18),  (20),  and  (21) 
none  of  tin-  expected  values  could  be  determined  analytically  so  all  were  calculated  during  the 
optimization.  All  of  the  parameters  for  the  nonlinear  controls  were  estimated  via  the  nonlinear 
regression  segment  of  (IRAFSTAT.  For  nonlinear  regression.  CRAFSTAT  uses  a  form  of  the 
Marquadt  algorithm  (Marquadt,  1063)  which  allows  bounds  to  be  placed  on  the  parameters. 
Lower  bounds  of  -.90  were  necessary  on  the  power  parameters,  Pj since  the  expected  values 
of  the  exponentials  (involving  the  gamma  function)  are  not  defined  for  p,k  <  —  1.  A  reasonable 
upper  bound  on  each  pjk  was  found  useful  in  speeding  convergence. 

As  the  control  functions  became  more  nonlinear,  their  effectiveness  usually  increased.  Al¬ 
lowing  the  powers  to  float  in  control  (18)  versus  being  fixed  in  control  (16)  gave  a  slight  improve¬ 
ment:  the  R2  went  from  .2265  up  to  .46-10.  This  was  not  as  good  however  as  the  “standard” 
control  (17)  with  two  linear  terms  and  two  quadratic  terms  which  achieved  an  R 2  of  .5627. 
Adding  the  two  linear  terms  to  control  ( 18)  resulted  in  control  (19).  Now  allowing  the  powers 
to  float  in  control  (  19),  versus  being  fixed  in  control  ( 17),  enabled  the  surface  to  fit  more  closely 
and  thus  the  R2  for  (19)  was  .7422.  This  definite  improvement  over  the  “standard”  controls 
can  be  seen  in  Figure  5. 


1  1 


Figure  5:  Surface  generated  by  “non-standard”  control  with  linear  and  nonlinear  terms  given 
in  (19). 


Originally,  the  outpoints  for  controls  (20)  and  (21)  were  parameters  to  be  optimized.  Unfor¬ 
tunately,  this  made  the  optimization  unstable  and  the  results  unreliable.  Thus,  the  outpoints 
were  fixed  at  selected  quantiles  and  not  included  as  parameters  in  the  nonlinear  regression. 
Selection  of  a  good  cut poi nt  was  done  by  examining  the  results  of  a  short  sequence  of  regres¬ 
sions.  For  control  (20)  a  outpoint  at  the  .5  quantile  was  the  most  effective  one  lound  for  this 
sample.  Comparing  Figure  6  to  Figure  5  shows  the  impact  of  adding  nonlinearity  by  the  use 
of  the  outpoint.  The  R2  for  control  (20)  was  .8216.  The  results  of  using  the  estimated  pa¬ 
rameters  for  (20)  on  three  independent  samples  of  1000,  Table  1,  indicate  that  even  though 
the  regression-estimated  parameters  are  biased  for  the  original  sample,  (20)  is  still  effective  in 
controlling  other  samples. 


Sample  1 

Sample  2 

Sample  3 

1 

.9882 

1.0022 

1.0219 

S? 

.0261 

.0262 

.0282 

Y* 

.9972 

1.0095 

1.0238 

SY' 

.0110 

.0124 

.0129 

R1 

.8239 

.7759 

.7905 

Table  1:  Effect  of  the  nonlinear,  single-cutpoint  control  given  in  (20)  on  three  independent 
samples  other  than  the  regression  sample. 
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1.5 


Figure  (i:  Surface  generated  by  the  nonlinear,  single-cutpoint  control  given  in  (20). 

As  the  number  of  cuf  points  increases  to  two  for  control  (21),  one  gets  a  more  effective 
control  at  the  cost  of  increased  computational  complexity.  The  computational  complexity 
increases  because  the  additional  cut-point  creates  more  parameters  and  because  the  computation 
of  expected  values  becomes  more  expensive.  As  before,  the  cutpoints  were  fixed  at  selected 
quantile  values.  Which  values  to  select  was  a  matter  of  performing  a  series  of  regressions  ori  a 
grid  of  values.  Figure  7  shows  that  some  pairs  of  cutpoints  were  better  than  others.  Figure  8 
shows  that  the  best  cutpoints  for  this  sample  on  the  grid  examined,  the  .30  and  .65  quantiles, 
yield  a  control  which  is  an  excellent  approximation  to  the  W2  surface.  The  regression  with 
these  cutpoints  on  the  original  sample  yielded  an  R2  of  .8372. 

This  last  control,  (21),  was  then  tested  on  independent  samples  and  the  R2  was  compared 
to  results  from  ACE.  Table  2  indicates  the  results  for  three  samples  of  1000  W2  values.  Again 
the  R2  values  are  almost  as  good  as  the  original  sample,  and  the  control  is  effective  in  all  three 
cases.  ACE  was  given  the  data  generated  by  using  the  cutpoints  on  the  original  sample  as  the 
independent  variables.  The  R2  value  derived  by  ACE  was  .8560  showing  that  control  (21)  is 
nearly  optimal  for  the  control  variables  used. 
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Figure  7:  Effects  of  ('hanging  the  Cutpoints  on  Correlation 


Figure  8:  Surface  generated 


Sample  1 

Sample  2 

Sample  3 

V 

.9531 

1.0230 

.9842 

*Y 

.0230 

.0265 

.0255 

y 

.9997 

1.0157 

1 .0001 

Sy, 

.0094 

.0121 

.0110 

It2  j  .S3 50 

.7925 

.8153 

Table  2:  F.  fleet  of  the  nonlinear,  double-cutpoint  control  given  in  (21)  on  three  independent 
sample  other  than  the  sample  used  for  regression. 

6  SUMMARY  AND  CONCLUSIONS 

This  study  demonstrates  the  potential  effectiveness  of  nonlinear  regression-adjusted  controls 
in  reducing  variance  in  simulations.  Various  piecewise  linear  and  power  transformations  were 
shown  to  l)e  useful  in  developing  control  functions.  There  are  many  questions  yet  to  bo  answered 
though.  Some  areas  of  investigation  are  listed. 

1 .  Finding  controls  for  the  variance,  percentiles  and  quantiles  of  IVj .  Once  a  suitable  control 
function  is  developed,  ran  it  be  used  with  different  parameters  for  other  aspects  of  tin- 
data  ? 

2.  f  inding  controls  for  IF*  for  n  >  2.  As  the  dimensionality  increases,  one  may  not  need 
every  independent  variable  in  the  control  function  to  get  effective  control.  Measures  of 
influence  or  leverage  could  possibly  be  used  to  reduce  the  size  of  the  control  function. 

3.  Fsing  otiier  transformations  such  as 

(a)  X  =  (>  v  -  l)  /-. 

(b)  7.  —  ,V;’ -  l)(f'vV  -  ijj  //)->.  or 

(c)  Z  =  _  i)  /7. 

I  hese  transformations  represent  a  broad  spectrum  of  transformations  on  a  variable  as  ran 
be  seen  in  Figures  9,  10  and  11.  Note  also  that  transformation  (3a)  and  transformation 
(3c)  contain  the  linear  case  as  a  special  set  of  parameter  values.  The  first  transformation. 
(3a).  is  a  positive  weighting  of  all  values,  with  large  values  weighted  more  than  small 
values.  By  varying  the  7  parameter,  one  can  scale  the  effects  of  the  weights  from  very 
large  for  large  7  to  very  slight  for  small  7.  The  second  transformation.  (3b),  applies 
small  negative  weights  for  values  less  than  1.  For  values  larger  than  1  it  allows  for  a  wide 
range  of  positive  weighting  schemes  as  in  transformation  (3a).  The  third  transformation. 
(3c),  is  similar  to  the  straightforward  power  transformation,  (Figure  1),  but  with  mon- 
parameters.  Thus  it  allows  for  more  flexibility  and  increased  curvature  for  smaller  values 
of  the  parameters.  The  difficult  part  with  these  transformations,  as  usual,  is  computing 
the  necessary  expected  values. 


15 


0.5 


1.0 


1.6 


2.0 


X 

Figure  9:  Transformation  (3a)  applied  to  a  Variable  X. 


7  =  .01  p  =.01  f 

-/  =  QQ  r>  =  01  ' 


0.5  1.0  1.6  2.0 

X 

Figure  10:  Transformation  (3b)  applied  to  a  Variable  X. 
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Figure  11:  Transformation  (3c)  applied  to  a  Variable  X. 


•I.  Using  similar  controls  for  gamma  family  statistics  such  as  those  encountered  in  queuing 
problems.  Preliminary  results  with  internal  control  of  a  regenerative  simulation  estimate 
of  the  waiting  time  of  the  vih  customer  in  an  M/M/1  queue  (Iglehart  and  Lewis,  1979) 
indicate  that  allowing  non  integer  powers  in  the  control  function  can  substantially  increase 
the  effectiveness  of  the  control.  Oti  a  sample  of  20.090  busy  periods,  with  p  =  .5  and  p  =  l, 
a  linear  control  function  obtained  an  R 2  of  .59  while  a  nonlinear  control  function  obtained 
an  R2  of  .69. 

5.  Investigating  problems  with  estimating  the  variance  of  the  variance-reduced  estimate  of 
L'[V],  This  is  a  very  difficult  problem  for  which  sectioning  or  bootstrapping  may  be 
needed. 
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